Application of Clustering in the Non-Parametric Estimation of Distribution Density

نویسندگان

  • T. Ruzgas
  • R. Rudzkis
  • M. Kavaliauskas
چکیده

Abstract. This paper discusses a multimodal density function estimation problem of a random vector. A comparative accuracy analysis of some popular non-parametric estimators is made by using the Monte-Carlo method. The paper demonstrates that the estimation quality increases significantly if the sample is clustered (i.e., the multimodal density function is approximated by a mixture of unimodal densities), and later on, the density estimation methods are applied separately to each cluster. In this paper, the sample is clustered using the Gaussian distribution mixture model and the EM algorithm. The highest efficiency in the analysed cases was reached by using the iterative procedure proposed by Friedman for estimating a density component corresponding to each cluster after the primary sample clustering mentioned. The Friedman procedure is based on both the projection pursuit of multivariate observations and transformation of the univariate projections into the standard Gaussian random values (using the density function estimates of these projections).

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Hyperbolic Cosine Log-Logistic Distribution and Estimation of Its Parameters by Using Maximum Likelihood Bayesian and Bootstrap Methods

‎In this paper‎, ‎a new probability distribution‎, ‎based on the family of hyperbolic cosine distributions is proposed and its various statistical and reliability characteristics are investigated‎. ‎The new category of HCF distributions is obtained by combining a baseline F distribution with the hyperbolic cosine function‎. ‎Based on the base log-logistics distribution‎, ‎we introduce a new di...

متن کامل

Semi-parametric Quantile Regression for Analysing Continuous Longitudinal Responses

Recently, quantile regression (QR) models are often applied for longitudinal data analysis. When the distribution of responses seems to be skew and asymmetric due to outliers and heavy-tails, QR models may work suitably. In this paper, a semi-parametric quantile regression model is developed for analysing continuous longitudinal responses. The error term's distribution is assumed to be Asymmetr...

متن کامل

تخمین احتمال بزرگی زمین‌لغزش‌های رخ‌داده در حوزه آبخیز پیوه‌ژن (استان خراسان رضوی)

Knowing the number, area, and frequency of landslides occurred in each area has a prominent role in the long-term evolution of area dominated by landslides and can be used for analyzing of susceptibility, hazard, and risk. In this regard, the current research is trying to consider identified landslides size probability in the Pivejan Watershed, Razavi Khorasan Province. In the first step, lands...

متن کامل

A robust wavelet based profile monitoring and change point detection using S-estimator and clustering

Some quality characteristics are well defined when treated as response variables and are related to some independent variables. This relationship is called a profile. Parametric models, such as linear models, may be used to model profiles. However, in practical applications due to the complexity of many processes it is not usually possible to model a process using parametric models.In these cas...

متن کامل

Moment Inequalities for Supremum of Empirical Processes of‎ ‎U-Statistic Structure and Application to Density Estimation

We derive moment inequalities for the supremum of empirical processes of U-Statistic structure and give application to kernel type density  estimation ‎and estimation of the distribution function for functions of observations.  

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2006